Abstract. In this paper we propose a new stochastic model, based on a generalization
of semi-Markov chains, to study the high-frequency price dynamics of traded stocks.
We assume that the financial returns are described by a weighted indexed semi-Markov
chain model. We show, through Monte Carlo simulations, that the model is able to
reproduce important stylized facts of financial time series, such as the first passage time
distributions and the persistence of volatility. The model is applied to data from the
Italian and German stock markets from 1 January 2007 until the end of December
2010.

1. Introduction

Semi-Markov processes (SMP) are a wide class of stochastic processes which generalize
both Markov chains and renewal processes. The main advantage of SMP is that they
allow any type of waiting time distribution to be used for modeling the time to a
transition from one state to another. By contrast, Markovian models constrain the
waiting times in the states to follow memoryless distributions (exponential or
geometric for the continuous and discrete time cases, respectively). This greater
flexibility has a price: there are more parameters to estimate.

SMP also generalize non-Markovian models based on continuous time random walks,
which are used extensively in the econophysics community; see for example [1, 2].
SMP have been used to analyze financial data and to describe different problems,
ranging from credit rating data modeling [3] to the pricing of options [4, 5].

With the financial industry becoming fully computerized, the amount of recorded
data, from daily close all the way down to tick-by-tick level, has exploded. Nowadays,
such tick-by-tick high-frequency data are readily available to practitioners and
researchers alike [6, 7]. It therefore seemed natural to us to test the semi-Markov
hypothesis of returns on high-frequency data, see [8]. In [8] we proposed a semi-Markov
model and showed its ability to reproduce some stylized empirical facts, for example
the absence of autocorrelation in returns and the gain/loss asymmetry. In that paper
we also showed that the autocorrelation in the square of returns is higher than in
the Markov model. Unfortunately, this autocorrelation was still too small compared to
the empirical one. To overcome the problem of low autocorrelation, in another
paper [9] we proposed an indexed semi-Markov model for price returns. More precisely,
we assumed that the intraday returns (up to one minute frequency) are described by a
discrete time homogeneous semi-Markov process into which we introduced a memory index
that takes into account the periods of different volatility in the market. It is well
known that market volatility is autocorrelated, so periods of high (low) volatility
may persist for a long time. We made the hypothesis that the kernel of the semi-Markov
process depends on the level of volatility in the market at that time. It should be
remarked that the weighted memory index is a stochastic process which depends on the same
Markov renewal chain $(J_n, T_n)$ with which the semi-Markov chain is associated. Hence,
in our model, the high autocorrelation is obtained endogenously, without introducing
external or latent auxiliary stochastic processes. To further improve our previous results,
in this work we propose an exponentially weighted index, which is described in the
following.

The database used for the analysis is made of high-frequency tick-by-tick price data
for all the stocks in the Italian and German stock markets from the first of January 2007
until the end of December 2010. From prices we then define returns at one minute frequency.

The plan of the paper is as follows. In Section 2 we define the weighted indexed
semi-Markov chain model with memory and explain how to perform a Monte Carlo
simulation of its trajectory. In Section 3 we present the empirical results obtained by
applying our model to real stock market data. Finally, in Section 4 we present
our conclusions.

2. The Weighted-Indexed Semi-Markov Model 

In this section we propose a generalization of the semi-Markov process that is able to
represent higher-order dependencies between successive observations of a state variable.
One way to increase the memory of the process is to use high-order semi-Markov
processes as defined in [10] and more recently revisited and extended in a discrete
time framework in [11]. A more parsimonious model has been defined in [12], where it is
shown to describe appropriately important empirical regularities of financial time
series. In this paper we propose a further improvement of the indexed semi-Markov
chain model proposed in reference [12], named the Weighted-Indexed Semi-Markov Chain
(WISMC) model, which makes it possible to reproduce long-term dependence in
the square of stock returns in a very efficient way.

Let us assume that the value of the financial asset under study is described by the
time-varying asset price $S(t)$. The return at time $t$ calculated over a time interval of
length 1 is defined as $\frac{S(t+1) - S(t)}{S(t)}$. The return process changes value in time, so we
denote by $\{J_n\}_{n \in \mathbb{N}}$ the stochastic process with finite state space $E = \{1, 2, \ldots, s\}$
describing the value of the return process at the n-th transition.

Let us consider the stochastic process $\{T_n\}_{n \in \mathbb{N}}$ with values in $\mathbb{N}$. The random
variable $T_n$ describes the time at which the n-th transition of the price return process
occurs.

Let us also consider the stochastic process $\{U_n^{\lambda}\}_{n \in \mathbb{N}}$ with values in $\mathbb{R}$. The random
variable $U_n^{\lambda}$ describes the value of the index process at the n-th transition.

In reference [9] the process $\{U_n\}$ was defined as a reward accumulation process
linked to the Markov renewal process $\{J_n, T_n\}$; in [9] the process $\{U_n\}$ was defined as
a moving average of the reward process. Here, motivated by the application to financial
returns, we consider a more flexible index process defined as follows:

$$U_n^{\lambda} = \sum_{k=0}^{n-1} \sum_{a=T_{n-1-k}}^{T_{n-k}-1} f(J_{n-1-k}, a, \lambda), \qquad (1)$$

where $f : E \times \mathbb{N} \times \mathbb{R} \to \mathbb{R}$ is a Borel measurable bounded function and $U_0^{\lambda}$ is known
and non-random.

The process $U_n^{\lambda}$ can be interpreted as an accumulated reward process with the
function $f$ as a measure of the weighted rate of reward per unit time. The function $f$
depends on the current time $a$, on the state $J_{n-1-k}$ visited at the current time and on the
parameter $\lambda$ that represents the weight.
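
As an illustration, the double sum in (1) can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function name `index_value`, the list-based encoding of the trajectory and the example reward are all assumptions made here for the example.

```python
from typing import Callable, Sequence


def index_value(J: Sequence[int], T: Sequence[int], n: int,
                f: Callable[[int, int, float], float], lam: float) -> float:
    """Accumulate f over every unit of time before T_n, as in equation (1).

    J[h] is the state entered at the h-th transition and T[h] the time of
    that transition (hypothetical encoding; T must have at least n+1 entries).
    """
    total = 0.0
    for k in range(n):                             # k = 0, ..., n-1
        state = J[n - 1 - k]                       # state held during the sojourn
        for a in range(T[n - 1 - k], T[n - k]):    # a = T_{n-1-k}, ..., T_{n-k} - 1
            total += f(state, a, lam)
    return total


# Hypothetical trajectory: states J_0..J_2 entered at times T_0..T_2, next jump at T_3.
J = [2, 1, 3]
T = [0, 2, 3, 6]
# Hypothetical reward: the squared state, ignoring the time a and the weight.
print(index_value(J, T, 3, lambda j, a, lam: float(j * j), 1.0))  # prints 36.0
```

State 2 is held for two time units, state 1 for one and state 3 for three, so the accumulated reward is 4·2 + 1·1 + 9·3 = 36.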

In the next section a specific functional form of $f$ will be selected in order to produce a real
data application.

To construct the WISMC model we have to specify a dependence structure between
the variables. Toward this end we adopt the following assumption:

$$P[J_{n+1} = j,\ T_{n+1} - T_n \le t \mid \sigma(J_h, T_h, U_h^{\lambda}),\ h = 0, \ldots, n,\ J_n = i,\ U_n^{\lambda} = v]$$
$$= P[J_{n+1} = j,\ T_{n+1} - T_n \le t \mid J_n = i,\ U_n^{\lambda} = v] := Q_{ij}^{\lambda}(v; t), \qquad (2)$$

where $\sigma(J_h, T_h, U_h^{\lambda})$, $h \le n$, is the natural filtration of the three-variate process.

The matrix of functions $Q^{\lambda}(v; t) = (Q_{ij}^{\lambda}(v; t))_{i,j \in E}$ plays a fundamental role in
the theory we are going to present; in recognition of its importance, we call it the
weighted-indexed semi-Markov kernel.

The joint process $(J_n, T_n)$ depends on the process $U^{\lambda}$, which acts as a stochastic
index. Moreover, the index process $U^{\lambda}$ depends on $(J_n, T_n)$ through the functional
relationship (1).

Observe that if

$$P[J_{n+1} = j,\ T_{n+1} - T_n \le t \mid J_n = i,\ U_n^{\lambda} = v] = P[J_{n+1} = j,\ T_{n+1} - T_n \le t \mid J_n = i]$$

for all values $v \in \mathbb{R}$ of the index process, then the weighted indexed semi-Markov
kernel degenerates into an ordinary semi-Markov kernel and the WISMC model becomes
equivalent to the classical semi-Markov chain model as presented, for example, in [13] and [11].

The triple of processes $\{J_n, T_n, U_n^{\lambda}\}$ describes the behaviour of the system only
at the transition times $T_n$. To describe the behaviour of our model at
any time $t$, which can be a transition time or a waiting time, we need to define
additional stochastic processes.

Given the three-dimensional process $\{J_n, T_n, U_n^{\lambda}\}$ and the weighted indexed
semi-Markov kernel $Q^{\lambda}(v; t)$, we define

$$N(t) = \sup\{n \in \mathbb{N} : T_n \le t\};$$
$$Z(t) = J_{N(t)};$$
$$U^{\lambda}(t) = \sum_{k=0}^{N(t)-1+\theta} \ \sum_{a=T_{N(t)+\theta-1-k}}^{(t \wedge T_{N(t)+\theta-k})-1} f(J_{N(t)+\theta-1-k}, a, \lambda), \qquad (3)$$

where $\theta = 1_{\{t > T_{N(t)}\}}$.

The stochastic processes defined in (3) represent the number of transitions up to
time $t$, the state of the system (price return) at time $t$ and the value of the index process
(weighted moving average of a function of price returns) up to $t$, respectively. We refer to
$Z(t)$ as a weighted indexed semi-Markov process.

The process $U^{\lambda}(t)$ is a generalization of the process $U_n^{\lambda}$ in which the time $t$ can be a
transition time or a waiting time. It is simple to see that if $t = T_n$ then
$U^{\lambda}(t) = U_n^{\lambda}$.
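
The three processes in (3) can be evaluated at an arbitrary integer time with a few lines of Python. The sketch below is illustrative only: the name `process_at`, the list encoding of the trajectory (with `J` and `T` of equal length) and the example reward are assumptions, and the time `t` is assumed to satisfy `t >= T[0]`.

```python
from bisect import bisect_right


def process_at(t, J, T, f, lam):
    """Return (N(t), Z(t), U(t)) following equation (3).

    J[h] is the state entered at transition time T[h]; t is an integer time.
    """
    n = bisect_right(T, t) - 1          # N(t) = sup{n : T_n <= t}
    theta = 1 if t > T[n] else 0        # 1 when t falls strictly inside a sojourn
    u = 0.0
    for k in range(n + theta):          # k = 0, ..., N(t) - 1 + theta
        h = n + theta - 1 - k           # sojourn currently being accumulated
        # Upper limit t ^ T_{h+1}; the last recorded sojourn is cut at t.
        upper = min(t, T[h + 1]) if h + 1 < len(T) else t
        for a in range(T[h], upper):
            u += f(J[h], a, lam)
    return n, J[n], u


# Hypothetical trajectory and reward (squared state).
J = [2, 1, 3, 1]
T = [0, 2, 3, 6]
square = lambda j, a, lam: float(j * j)
print(process_at(3, J, T, square, 1.0))   # at a transition time: (2, 3, 9.0)
print(process_at(4, J, T, square, 1.0))   # one time unit later:  (2, 3, 18.0)
```

At the transition time t = 3 the value 9.0 coincides with the accumulated reward of (1) for n = 2, in line with the remark that the two definitions agree at transition times.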

Let

$$p_{ij}^{\lambda}(v) := P[J_{n+1} = j \mid J_n = i,\ U_n^{\lambda} = v]$$

be the transition probabilities of the embedded indexed Markov chain. They denote the
probability that the next transition is into state $j$ given that at the current time the process
entered state $i$ and the index process is equal to $v$. It is simple to see that

$$p_{ij}^{\lambda}(v) = \lim_{t \to \infty} Q_{ij}^{\lambda}(v; t). \qquad (4)$$

Let $H_i^{\lambda}(v; \cdot)$ be the sojourn time cumulative distribution in state $i \in E$:

$$H_i^{\lambda}(v; t) := P[T_{n+1} - T_n \le t \mid J_n = i,\ U_n^{\lambda} = v] = \sum_{j \in E} Q_{ij}^{\lambda}(v; t). \qquad (5)$$

It expresses the probability of making a transition from state $i$ with sojourn time less
than or equal to $t$, given that the index process is $v$.

The conditional waiting time distribution function $G$ expresses the following
probability:

$$G_{ij}^{\lambda}(v; t) := P[T_{n+1} - T_n \le t \mid J_n = i,\ J_{n+1} = j,\ U_n^{\lambda} = v]. \qquad (6)$$

It is simple to establish that

$$G_{ij}^{\lambda}(v; t) = \begin{cases} \dfrac{Q_{ij}^{\lambda}(v; t)}{p_{ij}^{\lambda}(v)} & \text{if } p_{ij}^{\lambda}(v) \neq 0, \\ 1 & \text{if } p_{ij}^{\lambda}(v) = 0. \end{cases} \qquad (7)$$
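
To make relations (4), (5) and (7) concrete, the following Python sketch derives the three quantities from a tabulated kernel for one fixed index value $v$. The nested-list layout `Q[i][j][t]`, the function names and the toy numbers are assumptions for illustration; in particular the sketch assumes the kernel has already reached its limit at the last tabulated time.

```python
def embedded_probs(Q):
    """p_ij(v) as in (4): the kernel at the largest tabulated time."""
    return [[q_ij[-1] for q_ij in row] for row in Q]


def sojourn_cdf(Q, i):
    """H_i(v; t) as in (5): the kernel summed over the arrival state j."""
    horizon = len(Q[i][0])
    return [sum(Q[i][j][t] for j in range(len(Q[i]))) for t in range(horizon)]


def conditional_waiting_cdf(Q, i, j):
    """G_ij(v; t) as in (7), with the convention G = 1 when p_ij(v) = 0."""
    p = Q[i][j][-1]
    if p == 0:
        return [1.0] * len(Q[i][j])
    return [q / p for q in Q[i][j]]


# Hypothetical kernel with two states, tabulated at t = 0, 1, 2, 3.
Q = [
    [[0.0, 0.0, 0.0, 0.0], [0.0, 0.5, 0.75, 1.0]],
    [[0.0, 0.25, 0.5, 0.5], [0.0, 0.25, 0.5, 0.5]],
]
print(embedded_probs(Q))                 # [[0.0, 1.0], [0.5, 0.5]]
print(sojourn_cdf(Q, 1))                 # [0.0, 0.5, 1.0, 1.0]
print(conditional_waiting_cdf(Q, 1, 0))  # [0.0, 0.5, 1.0, 1.0]
```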

In the papers [9] and [8] explicit renewal-type equations were given to describe the
probabilistic behavior of the indexed semi-Markov chain. It is possible to derive similar
results for the WISMC model, but here we prefer not to report them
because, in the implementation of the model given in the next section, we
follow a Monte Carlo simulation based approach. Monte Carlo methods are very useful
for simulating the system behavior and represent a way of generating replicable results.
In the following we give a Monte Carlo algorithm to simulate a trajectory of
a given WISMC in the time interval $[0, T]$. The algorithm consists of repeated random
sampling to compute the successive visited states $\{J_0, J_1, \ldots\}$, the
jump times $\{T_0, T_1, \ldots\}$ and the index values $\{U_0^{\lambda}, U_1^{\lambda}, \ldots\}$ up to the time $T$.
The algorithm consists of 5 steps:

1) Set $n = 0$, $J_0 = i$, $T_0 = 0$, $U_0^{\lambda} = v$, horizon time $= T$;

2) Sample $J$ from $p_{J_n \cdot}^{\lambda}(U_n^{\lambda})$ and set $J_{n+1} = J(\omega)$;

3) Sample $W$ from $G_{J_n J_{n+1}}^{\lambda}(U_n^{\lambda}; \cdot)$ and set $T_{n+1} = T_n + W(\omega)$;

4) Set $U_{n+1}^{\lambda} = \sum_{k=0}^{n} \sum_{a=T_{n-k}}^{T_{n+1-k}-1} f(J_{n-k}, a, \lambda)$;

5) If $T_{n+1} \ge T$ stop,

else set $n = n + 1$ and go to 2).
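
The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used for the results: states are numbered from 0, the callables `p`, `draw_wait` and `f` and their toy definitions are hypothetical stand-ins for the estimated $p^{\lambda}$, $G^{\lambda}$ and reward function, and the index is recomputed from scratch at each step for clarity rather than speed.

```python
import random


def simulate_wismc(i0, v0, horizon, p, draw_wait, f, lam, rng):
    """Simulate (J_n, T_n, U_n) until the first jump time at or beyond `horizon`."""
    J, T, U = [i0], [0], [v0]                        # step 1
    while True:
        probs = p(J[-1], U[-1])                      # row of p_{J_n, .}(U_n)
        j_next = rng.choices(range(len(probs)), weights=probs)[0]   # step 2
        w = draw_wait(J[-1], j_next, U[-1], rng)     # step 3: W drawn from G(U_n; .)
        J.append(j_next)
        T.append(T[-1] + w)
        n1 = len(T) - 1                              # this is n + 1
        u = 0.0                                      # step 4: index at transition n + 1
        for k in range(n1):
            for a in range(T[n1 - 1 - k], T[n1 - k]):
                u += f(J[n1 - 1 - k], a, lam)
        U.append(u)
        if T[-1] >= horizon:                         # step 5
            return J, T, U


def p(i, v):
    # Hypothetical transition row: a large index value favours state 1.
    return [0.8, 0.2] if v < 5 else [0.3, 0.7]


def draw_wait(i, j, v, rng):
    # Hypothetical waiting time, uniform on {1, 2, 3}.
    return rng.randint(1, 3)


def f(state, a, lam):
    # Hypothetical reward: the state label itself.
    return float(state)


J, T, U = simulate_wismc(0, 0.0, 50, p, draw_wait, f, 1.0, random.Random(0))
print(len(T) - 1, "transitions; last jump time", T[-1])
```

Passing a seeded `random.Random` instance keeps a run reproducible, which is convenient when many trajectories are compared against real data.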

3. Empirical results 

In the following we show how our model performs by comparing its statistical features with
those of real return data. The comparison is done by means of Monte Carlo simulations
according to the algorithm described in the previous section.

Figure 1. Discretization of returns

For our analysis we chose 4 stocks from two databases of tick-by-tick quotes of
real stocks from the Italian Stock Exchange ("Borsa Italiana") and the German Stock
Exchange ("Deutsche Börse"). The chosen stocks are ENI and FIAT from the Italian
database and Allianz and Volkswagen from the German database. The period used goes
from January 2007 to December 2010 (4 full years). The data have been re-sampled to
1 minute frequency. The number of returns analyzed is then roughly $500 \times 10^3$ for
each stock.
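
The re-sampling step is not detailed in the text; one plausible way to obtain one-minute returns from ticks is sketched below in Python. Everything in it is an assumption made for illustration: the function name, the `(seconds, price)` tick format, the use of the last traded price in each minute and the carrying forward of that price through minutes without trades.

```python
def one_minute_returns(ticks):
    """Turn sorted (seconds_since_open, price) ticks into one-minute returns.

    The price of a minute is the last tick in it; minutes with no trades
    reuse the previous price (an assumption, not the paper's stated rule).
    Returns the list of (S(t+1) - S(t)) / S(t) on the one-minute grid.
    """
    if not ticks:
        return []
    closes = {}
    for sec, price in ticks:
        closes[sec // 60] = price           # later ticks overwrite earlier ones
    first, last = ticks[0][0] // 60, ticks[-1][0] // 60
    prices, prev = [], None
    for minute in range(first, last + 1):
        prev = closes.get(minute, prev)     # carry forward through empty minutes
        prices.append(prev)
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


# Hypothetical ticks: two trades in minute 0, one in minute 1, none in 2, one in 3.
ticks = [(5, 10.0), (50, 10.5), (70, 10.5), (200, 10.0)]
print(one_minute_returns(ticks))   # two zero returns, then a drop of 0.5 / 10.5
```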

To be able to model returns as a semi-Markov process the state space has been
discretized. In the 4 examples shown in this work, we discretized returns into 5 states
chosen to be symmetrical with respect to zero returns and to keep the shape of
the distribution unchanged. Returns are in fact already discretized in real data due to
the discretization of stock prices, which is fixed by each stock exchange and depends
on the value of the stock. For example, in the Italian stock market, for
stocks with value between 5.0001 and 10 euros the minimum variation is fixed at 0.005
euros (usually called the tick). We tried to remain as close as possible to this
discretization. In Figure 1 we show an example of the discretization of the returns of
one of the analyzed stocks. The model described in the previous section requires the
specification of a function $f$ in the definition of the weighted index in (1). Let us
briefly recall that the volatility of real markets is long-range positively autocorrelated and
hence clustered in time. This implies that, in the stock market, there are periods of high
and low volatility. Motivated by these empirical facts, we suppose that the transition
probabilities also depend on whether the market is in a high volatility period or in a low one.
In a previous work [9], for simplicity, we used a moving average of the squares of
returns as the index variable $U$. In that case the index depended only
on a memory $m$, the number of past transitions used for the moving
average. In this work we decided to use a more appropriate expression for $f$. We use
an exponentially weighted moving average (EWMA) of the squares of returns, which has
the following expression:

$$f(J_{n-1-k}, a, \lambda) = \frac{\lambda^{T_n - a} J_{n-1-k}^2}{\sum_{k=0}^{n-1} \sum_{a=T_{n-1-k}}^{T_{n-k}-1} \lambda^{T_n - a}}, \qquad (8)$$

and consequently the index process becomes

$$U_n^{\lambda} = \sum_{k=0}^{n-1} \sum_{a=T_{n-1-k}}^{T_{n-k}-1} \left( \frac{\lambda^{T_n - a} J_{n-1-k}^2}{\sum_{k=0}^{n-1} \sum_{a=T_{n-1-k}}^{T_{n-k}-1} \lambda^{T_n - a}} \right). \qquad (9)$$

Figure 2. Discretization of index values
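
The exponentially weighted index in (9) can be computed with a single pass over the past time units, as the following Python sketch shows. It is an illustration under assumptions: the name `ewma_index` is hypothetical, `J[h]` is taken to hold the numerical return value attached to the state entered at time `T[h]`, and `n >= 1` is required so that the normalising sum is not empty.

```python
def ewma_index(J, T, n, lam):
    """Normalised EWMA of squared returns at the n-th transition, as in (9).

    Each past time unit a in [T_0, T_n) carries the weight lam ** (T_n - a).
    """
    num = 0.0
    den = 0.0
    for k in range(n):                            # k = 0, ..., n-1
        value = J[n - 1 - k]
        for a in range(T[n - 1 - k], T[n - k]):   # time units of that sojourn
            w = lam ** (T[n] - a)
            num += w * value * value
            den += w
    return num / den


# Hypothetical trajectory: values 2, 1, 3 held for 2, 1 and 3 time units.
J = [2, 1, 3]
T = [0, 2, 3, 6]
print(ewma_index(J, T, 3, 1.0))   # lam = 1 gives the plain time average: 6.0
print(ewma_index(J, T, 3, 0.5))   # lam < 1 leans towards the most recent value
```

With $\lambda = 1$ all weights are equal and the result is the unweighted average of squared values over the six time units, (8 + 1 + 27)/6 = 6. With $\lambda = 0.5$ the recent sojourn in the value 3 dominates and the index rises to 520/63, about 8.25.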

The index $U^{\lambda}$ was also discretized into 5 states of low, medium low, medium, medium
high and high volatility. An example of the discretization used in the analysis is shown
in Figure 2.
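
Both discretizations (returns and index values) amount to assigning each value to one of five bins. A minimal Python sketch is given below; the function name and, above all, the threshold values are hypothetical and are not the ones used in the analysis.

```python
from bisect import bisect_right


def discretize(values, edges):
    """Map each value to a bin index in 0..len(edges); edges must be sorted.

    A value equal to an edge is assigned to the upper bin.
    """
    return [bisect_right(edges, x) for x in values]


# Hypothetical thresholds, symmetric around zero, giving 5 return states.
return_edges = [-0.002, -0.0005, 0.0005, 0.002]
print(discretize([-0.003, -0.001, 0.0, 0.001, 0.003], return_edges))
# prints [0, 1, 2, 3, 4]; state 2 is the one containing zero return
```

The same function can be reused for the index with four positive thresholds separating low, medium low, medium, medium high and high volatility.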

A very important feature of stock market data is that, while returns are uncorrelated
and show i.i.d.-like behavior, their squares or absolute values are long-range correlated.
It is very important that theoretical models of returns reproduce this feature. We
therefore tested our model to check whether it is able to reproduce such behavior. Given
the presence of the parameter $\lambda$ in the index function, we tested the autocorrelation
behavior as a function of $\lambda$. Note that in the definition of the index variable the EWMA
is performed over all the previous squares of returns, each with its weight. Before summing
over all past returns we decided to check whether a better memory time $m$ exists. For
this reason we also checked our model against this other parameter. With this choice
formula (9) takes the form:



$$U_n^{\lambda}(m) = \sum_{k=n-m}^{n-1} \sum_{a=T_{n-1-k}}^{T_{n-k}-1} \left( \frac{\lambda^{T_n - a} J_{n-1-k}^2}{\sum_{k=n-m}^{n-1} \sum_{a=T_{n-1-k}}^{T_{n-k}-1} \lambda^{T_n - a}} \right). \qquad (10)$$

We recall the definition of the autocorrelation function: if $Z$ indicates returns, the
time-lagged ($\tau$) autocorrelation of the square of returns is defined as

$$\Sigma(\tau) = \frac{\mathrm{Cov}(Z^2(t+\tau), Z^2(t))}{\mathrm{Var}(Z^2(t))}. \qquad (11)$$

Figure 3. Mean square error between autocorrelation functions from real and
simulated data as functions of $m$ and for different values of $\lambda$.
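
For a finite series, the autocorrelation of squared returns has to be replaced by a sample estimate. The Python sketch below uses the common biased estimator (every lag divided by the full sample size); that choice, like the function name, is an assumption, since the estimator used in the analysis is not specified.

```python
def acf_squared(returns, max_lag):
    """Sample autocorrelation of squared returns for lags 1..max_lag."""
    x = [r * r for r in returns]
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    acf = []
    for lag in range(1, max_lag + 1):
        # Covariance between the series and its copy shifted by `lag`.
        cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n
        acf.append(cov / var)
    return acf


# Hypothetical returns whose squares alternate in blocks: 1, 1, 4, 4, 1, 1, 4, 4.
print(acf_squared([1, -1, 2, -2, 1, -1, 2, -2], 2))   # [0.125, -0.75]
```

The returns in the example change sign at every step, yet their squares are strongly dependent, which is the kind of structure this statistic is meant to detect.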



We estimated $\Sigma(\tau)$ for real data and for return time series simulated with different
values of the memory time $m$ and the weight $\lambda$. The time lag $\tau$ was made to run from
1 minute up to 100 minutes. Note that, to be able to compare results for $\Sigma(\tau)$, each
simulated time series was generated with the same length as the real data. In Figure 3
we show the mean square error between $\Sigma(\tau)$ obtained from real and simulated returns
(using definition (10) for the index process) for the four stocks analyzed and for different
$m$ and $\lambda$. Let us make some considerations on the results shown in Figure 3: $m$ should be
chosen as large as possible, so definition (9) is appropriate as long as $\lambda$ is chosen less
than 1; when $\lambda = 1$, definition (10) becomes equivalent to a moving average
without weights and the results presented in [9] hold for $m$. In Figure 4 we show again
the mean square error, but only as a function of the weight $\lambda$, thus using definition (9)
for the index process. We can notice that the behavior is very similar for the different
analyzed stocks, even if the best value for $\lambda$ is not the same for all of them. The
best values of $\lambda$ for the stocks Fiat, Eni, Allianz and Volkswagen are
0.96, 0.97, 0.97 and 0.98, respectively.
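
The selection of the weight by mean square error can be sketched as follows in Python. The function names and all the numbers in the example are hypothetical; they only illustrate the comparison between one real autocorrelation curve and the curves obtained from series simulated with different weights.

```python
def mean_square_error(acf_real, acf_sim):
    """Average squared gap between two autocorrelation curves of equal length."""
    if len(acf_real) != len(acf_sim):
        raise ValueError("curves must cover the same lags")
    return sum((a - b) ** 2 for a, b in zip(acf_real, acf_sim)) / len(acf_real)


def best_weight(acf_real, acf_by_weight):
    """Return the candidate weight whose simulated curve is closest to the real one."""
    return min(acf_by_weight,
               key=lambda lam: mean_square_error(acf_real, acf_by_weight[lam]))


# Hypothetical curves over three lags (not taken from the paper).
acf_real = [0.30, 0.20, 0.10]
acf_by_weight = {
    0.90: [0.10, 0.05, 0.00],
    0.97: [0.28, 0.21, 0.10],
    1.00: [0.50, 0.40, 0.30],
}
print(best_weight(acf_real, acf_by_weight))   # 0.97
```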

The comparison between the autocorrelations for the best values of $\lambda$ for each stock
and real data is shown in Figure 5. This figure shows that real and synthetic data have
almost the same autocorrelation function for the square of returns.

Figure 4. Mean square error between autocorrelation functions from real and
simulated data as functions of $\lambda$.

Figure 5. Autocorrelation functions of real data (solid line) and synthetic (dashed
line) time series for the analyzed stocks.

We also tested whether our model is able to reproduce the behavior shown by real
data regarding the first passage time (fpt) distribution [14, 15, 16]. Let us recall
the definition of fpt: the fpt for an investment made at time $t$ at price $S(t)$ is defined
as the time interval $\tau = t' - t$, $t' > t$, at which the relation $S(t + \tau)/S(t) \ge \rho$ is fulfilled for
the first time. We will denote the fpt as $T_{\rho}(t)$. Then

$$T_{\rho}(t) = \min\{\tau > 0;\ S(t + \tau)/S(t) \ge \rho\}.$$

Figure 6. First passage time distribution of real data (solid line) and synthetic (dashed
line) time series for the analyzed stocks.

In [8] we have shown how to calculate this distribution analytically for a semi-Markov
process, so we will not repeat that here. Using the best values of $\lambda$ for each stock
(Fiat, Eni, Allianz and Volkswagen) and choosing a value $\rho = 1.005$ for all of them, we
compare in Figure 6 the first passage time distributions of real and synthetic data for each stock. It can
be noticed that they are almost identical, improving on the results obtained for a simple
semi-Markov process presented in [8].
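
On a sampled price series the first passage time can also be measured directly. The Python sketch below is a brute-force illustration, not the analytical calculation of [8]: the function name and the example prices are hypothetical, the threshold is taken as reached when the price ratio is at least $\rho$, and starting times for which the threshold is never reached within the sample are skipped.

```python
def first_passage_times(prices, rho):
    """Empirical first passage times for every starting index of the series.

    For a start t, the result is the smallest tau > 0 such that
    prices[t + tau] / prices[t] >= rho.
    """
    times = []
    for t, start_price in enumerate(prices):
        for tau in range(1, len(prices) - t):
            if prices[t + tau] / start_price >= rho:
                times.append(tau)
                break            # only the first crossing counts
    return times


# Hypothetical one-minute prices and the threshold rho = 1.005 used in the text.
prices = [100.0, 100.2, 100.6, 100.4, 101.2]
print(first_passage_times(prices, 1.005))   # [2, 3, 2, 1]
```

A histogram of the returned values gives the empirical distribution that can be set against the one obtained from simulated trajectories. The double loop is quadratic in the series length, so a series of several hundred thousand returns would call for a faster scheme.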

The results obtained here improve further on those obtained in our previous work [8, 9],
showing that the semi-Markov approach is adequate to model high-frequency
financial time series.

4. Conclusions 

We have modeled financial price changes through a semi-Markov model to which we have
added a weighted index. Our work is motivated by two main results: the existence in
the market of periods of low and high volatility, and our previous work [9], where we
showed that an indexed semi-Markov model is able to capture almost all the correlation
in the square of returns present in real data. The results presented here show that the
semi-Markov kernel is influenced by the past volatility and that its influence decreases
exponentially with time. In fact, if the past volatility is used as an exponentially
weighted index, the model is able to reproduce almost exactly the behavior of market
returns: the returns generated by the model are uncorrelated, while the squares of returns
present a long-range correlation very similar to that of real data.

We have also shown, by analyzing different stocks from different markets (Italian
and German), that the results do not depend on the particular stock chosen for the analysis,
even if the value of the weight may depend on the stock.

We stress that our model is very different from those of the ARCH/GARCH family.
We do not model the volatility directly as a correlated process. We model returns and,
by considering the semi-Markov kernel conditioned on a weighted index, the volatility
correlation arises naturally.